Stochastic dynamics of model proteins on a directed graph 
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A method for reconstructing tlie energy landscape of simple polypeptidic chains is described. 
We show that we can construct an equivalent representation of the energy landscape by a suitable 
directed graph. Its topological and dynamical features are shown to yield an effective estimate of 
the time scales associated with the folding and with the equilibration processes. This conclusion is 
drawn by comparing molecular dynamics simulations at constant temperature with the dynamics 
on the graph, defined by a temperature dependent Markov process. The main advantage of the 
graph representation is that its dynamics can be naturally renormalized by collecting nodes into 
"hubs" , while redefining their connectivity. We show that both topological and dynamical properties 
are preserved by the renormalization procedure. Moreover, we obtain clear indications that the 
heteropolymers exhibit common topological properties, at variance with the homopolymer, whose 
peculiar graph structure stems from its spatial homogeneity. In order to obtain a clear distinction 
between a " fast folder" and a " slow folder" in the heteropolymers one has to look at kinetic features 
of the directed graph. We find that the average time needed to the fast folder for reaching its native 
configuration is two orders of magnitude smaller than its equilibration time, while for the bad folder 
these time scales are comparable. Accordingly, we can conclude that the strategy described in 
this paper can be successfully applied also to more realistic models, by studying their renormalized 
dynamics on the directed graph, rather than performing lengthy molecular dynamics simulations. 

PACS numbers: 87.15 A- , 87.15. hm , 82.35.Lr , 05.40.-a 
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INTRODUCTION 



Numerical simulations are quite often an effective ap- 
proach for studying the dynamical properties of sys- 
tems with many degrees of freedom. When the interac- 
tion among the constituent particles (atoms, molecules, 
monomers, etc.) are ruled by short-range forces, molecu- 
lar dynamics (MD) techniques provide useful hints for un- 
derstanding complex dynamical phenomena. The main 
practical limitation to this basic approach is the exis- 
tence of relaxation processes, whose time scales are sev- 
eral orders of magnitude longer than the typical time 
scale of the microscopic dynamics. Models of structural 
glasses exhibit such puzzling features, which are associ- 
ated with the anomalously high viscosity of these amor- 
phous materials and with the ageing phenomenon 
Something similar occurs in proteins, where the folding 
or the equilibration processes may proceed over exceed- 
ingly long time scales with respect to the microscopic 
ones 2]. In all of these cases reproducing the interest- 
ing phenomena from a microscopic description can be 
very expensive in terms of CPU time and mass storage 
memory. A possible solution to such difficulties could 
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be based on a drastic simplification of the microscopic 
model to a " coarse grained" version, yielding an effective 
representation of the dynamics in terms of macroscopic 
modes. For instance, such a strategy is usually applied 
for obtaining a hydrodynamic description, by separating 
the corresponding long time scales from the fast ones. 
The so-called "projection technique" at the basis of the 
linear-response theory by Green and Kubo ^ provides a 
suitable tool for cooking such a recipe. Unfortunately, it 
applies successfully only to a few simple models of hard- 
sphere fluids. 

A suitable strategy for tackling an effective description 
of anomalously long relaxation processes has to take into 
account that they are associated with the peculiar struc- 
ture of the energy landscape. In particular, the phase 
space is typically partitioned hierarchically into loosely 
connected regions: a trajectory may last over a very long 
time inside a subset of the available phase space, before 
finding a wayout through some "bottleneck" and enter an 
unexplored region. For temperatures small enough with 
respect to the energy barriers of the bottlenecks, the sit- 
uation may look quite similar to the phenomenon of the 
breaking of ergodicity characterizing the phase transi- 
tions of statistical mechanics. On the other hand, since 
in MD the number of degrees of freedom is finite and 
the phenomenological potentials are typically represented 
by smooth functions, one has not to deal with real sin- 
gularities and we can guess that the entire phase space 
should be explored after a sufficiently long (usually, ex- 
ceedingly long) time. It is worth stressing that this con- 
jecture seems quite plausible, despite no general rigorous 
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proof of ergodicity for such phenomenological potentials 
is available. 

On a finer spatial scale one can view the energy land- 
scape as a collection of the basins of attraction of the 
local minima of the potential energy. A basin of attrac- 
tion of a local minimum is the collection of points in 
phase space, whose overdamped dynamics converges to 
that minimum. By associating a different symbol to each 
basin of attraction one could reconstruct a dynamical 
trajectory as a sequence of such symbols, thus yielding 
a coarse-grained representation of the dynamics. At the 
boundaries between nearby basins of attractions there 
are saddles which allow a trajectory to explore different 
basins during its evolution. We guess that a suitable 
approximation of the dynamics in phase space can be 
obtained by replacing the trajectories with a sequence 
of thermally activated transitions between different local 
minima. We expect that such an approximation is effec- 
tive when the temperature of the system (expressed in 
units of the Boltzmann constant) is small enough to be 
compared with the typical height of the energy barriers 
(saddles) separating nearby basins. 

Relying upon these considerations, we assume that for 
sufficiently (but no too) small temperatures the MD can 
be effectively replaced by a stochastic dynamics defined 
onto a connected graph. The local minima, or equiv- 
alently their basins of attraction, are the nodes of this 
graph. A link between two nodes is drawn if the cor- 
responding minima are connected in phase space by a 
saddle. The strength associated with the link is given 
by the transition rate between the two coupled minima. 
This quantity can be determined by purely geometric fea- 
tures of the energy landscape as a suitable generalization 
of the Arrhcnius law to a high-dimensional space (i.e., 
Langer's formula, see Section UlCp . The directed nature 
of the graph is a straightforward consequence of the dif- 
ferent energy gaps which, in general, separate each one 
of the two minima from their connecting saddle. 

We want to stress that the equivalence between the 
MD and the stochastic coarse-grained dynamics on the 
directed graph has to be assumed valid in a statistical 
sense. Actually, a stochastic path through the graph 
should not necessarily correspond to any dynamical tra- 
jectory. Nonetheless, we guess that by averaging over 
many paths on the directed graph one obtains statistical 
inferences consistent with averages over many trajecto- 
ries generated by the MD. 

In this manuscript we test this approach by studying 
simple model proteins whose dynamical features were an- 
alyzed in a previous publication [3] . In Section |TT] we 
summarize the model and the method used for recon- 
structing minima and saddles of the energy landscape 
of such model proteins. We also show that the strat- 
egy proposed in [ij can be improved by adopting a suit- 
able search procedure for identifying shortcuts, which 
connect minima separated by large conformational dis- 
tances. Later in this section we show how one can plug a 
Markov-chain structure onto the reconstructed directed 



graph. In Section IIIII we discuss how such a represen- 
tation can be used for extracting equilibrium and non 
equilibrium properties, to be compared with MD simu- 
lations. As expected the "graph" approximation is ef- 
fective in a range of temperatures close to the folding 
one. In Section HVl we comment about the advantages of 
the "graph" approximation with respect to MD simula- 
tions. In fact, one can easily realize that a suitable recon- 
struction of the energy landscape, including a sufficient 
sampling of minima and saddles, requires a considerable 
numerical effort, comparable with MD simulations. This 
notwithstanding, a graph representation allows to obtain 
many additional information on the main features of the 
model. In particular, we argue that a renormalization 
procedure can be successfully applied in such a way to 
keep the main dynamical and statistical features asso- 
ciated with equilibrium and transient processes, while 
eliminating the unessential details associated with a large 
number of minima and saddles. Moreover, additional in- 
formations about the nature of model proteins can be 
obtained by applying standard analysis of static proper- 
ties of random directed graphs. We can conclude that 
some general static features are common to any model 
of a polypeptidic chain. The main specific signatures 
identifying a protein specimen (fast folder) seem rather 
associated to dynamical features. This is not completely 
unexpected, although quite often one can find in the liter- 
ature claims about specific static properties of the energy 
landscape as intrinsic to real proteins f6l|. Our analysis 
at least challenges this widespread belief. 



II. THE MODEL, ITS ENERGY LANDSCAPE 
AND THE DIRECTED GRAPH 



A. A simple toy-model of polypeptidic chains in 2D 

For the sake of simplicity we want to test our idea of 
approximating the thermalized dynamics of a polypep- 
tidic chain by a stochastic dynamics on a directed graph 
with a toy model, first introduced in This is a slight 
modification of the 2d off-lattice HP model originally 
proposed by Stillinger et al. in [sl]. It is defined by the 
Hamiltonian 
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is the potential energy defined by the phenomenological 
potentials 
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All the parameters are expressed in terms of adimensional 
arbitrary units: for instance, a and tq are fixed to the val- 
ues 20 and 1, respectively. The model Hamiltonian repre- 
sents a one-dimensional chain of L point-like monomers 
corresponding to the residues of a real protein. Only two 
types of residues are considered: hydrophobic, H, and 
polar, P . Accordingly, a heteropolymer is identified by 
a sequence of discrete variables (with i = 1, . . . , L) 
along the chain: = ±1 indicates that the i-th. residue is 
of type H or P, respectively. The intramolecular potential 
is composed of three terms: a stiff nearest-neighbor har- 
monic potential Vi , which keeps the bond distance almost 
constant, a three-body potential V2, which measures 
the energetic cost of local bending, and a long-range 
Lennard-Jones potential V3 acting between all pairs of 
monomers i and j such that |« — i| > 1. The monomers 
are assumed to have the same unitary mass. The space 
coordinates of the i-th monomer are — {xi,yi) and 
their conjugated momenta are = {px,i,Py,i) = [ii, Vi) ■ 
The variable r,;.j — ^ (xi — XjY + {yi — yjY is the dis- 
tance between i-th and j-th monomer and 9i is the bond 
angle at the i-th monomer. V3 is the only contribution 
that depends on the nature of the monomers. The coef- 
ficients Cij = ^{1 + £,i + £,j -\- are defined in such 
a way that the interaction is attractive if both residues 
are either hydrophobic or polar (with Cij = 1 and 1/2, 
respectively), while it is repulsive if the residues belong 
to different species (cy = —1/2). 

Here, we focus our investigation on three sequences 
of twenty monomers that represent the three classes of 
different folding behaviors observed in this model: 

• [SO] a homopolymcr composed of 20 H residues; 

• [S1] = [HHHP HHHP HHHP PHHP PHHH] a se- 
uence that has been identified as a fast-folder in 
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• [S2] = [PPPH HPHH HHHH HHHP HHPH] a ran- 
domly generated sequence, that has been identified 
as a slow-folder in [3]. 

The three characteristic temperature Tg, Tf, Tg of each 
sequence determined in 10] by MD simulations, where 
the chains are in contact with a Langevin heat reservoir, 
are reported in Table HI 

Several distances can be defined in order to distin- 
guish between two configurations Ci and C2 of a two- 
dimensional chain. A particularly simple one is the an- 





SO 


SI 


S2 


Te 


0.16 


0.11 


0.13 


Tf 


0.044 


0.061 


0.044 


T, 


0.022 


0.048 


0.025 


no 


31 


37 


36 


V{Q) 


-7.04 


-4.67 


-4.67 



TABLE I: The collapse transition temperature Te, the folding 
temperature Tj, the "glassy" temperature Tg, the number 
no of nearest neighbor minima of the native configuration 
for the sequences SO (homopolymer), SI (fast-folder) and S2 
(slow-folder). V{Q) is the minimum of the potential energy, 
corresponding to the native state. 
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gular distance 

dg{C^,C2) - J2 l^'(^i) - ^'(^2)1, (5) 

where 



n=l 



^ (qi - qi-i-i)-(qi+i - qi-i-2) 

is the i-th backbone angle of configuration C, represented 
by the coordinates q^. 



B. Reconstruction of the energy landscape 

A suitable reconstruction of the energy landscape can 
be obtained by effective computational strategies for 
identifying the local minima of the potential energy 
and the first order saddles connecting them in a high- 
dimensional configuration space. A first extensive search 
for minima can be performed by sampling a sufficiently 
large set of MD trajectories at a given temperature T. 
This is fixed by a Langevin heat bath acting on all the de- 
grees of freedom of Hamiltonian ([T]) . Since one is usually 
interested in exploring the effects of the energy landscape 
on the folding process of polypeptidic chains, it is conve- 
nient sampling trajectories at fixed time intervals close 
to the folding temperature Tf . Each sample is a dynam- 
ical configuration, which is used as the initial condition 
for the overdamped dynamics (see [3] ) : it eventually con- 
verges to the local minimum, whose basin of attraction 
contains the sampled configuration. As shown in Q a 
large number of minima of the energy landscape of se- 
quences SO, SI and S2 can be identified by performing 
0(10"^) Langevin trajectories at T = 0.1 and sampling 
them at a time pace At = 0.1 in the natural time units 
of the model. The strength of the coupling with the heat 
bath is given by the dissipation rate 7 = 7: this value, ex- 
pressed in the adimensional units of the model, has been 
estimated from the knowledge of the relaxation rate of 
an aminoacid in a solvent (typically, water) p^ . 

Once this preliminary set of minima has been pro- 
duced, one is interested in determining the pairs of min- 
ima which are connected through a first order saddle in 
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the landscape. In fact, for potentials of class or higher, 
as those contained in Hamiltonian ([1]), the minimal en- 
ergy path connecting two minima, usually passes through 
a first order saddle of the energy landscape [III ■ 

In the last decade various methods have been proposed 
for d esig nin g an efficient algorithm for the search of sad- 
dles jl2l. [l3LTl3|. Unfortunately, none of these statistical 
approaches provides the detailed reconstruction of the 
energy landscape to allow for an effective representation 
of the corresponding directed graph. As shown in one 
can take advantage from a metric criterion for identify- 
ing such pairs of minima: they are typically separated by 
an angular distance dg smaller than the threshold value 
dg'"' = 0.2 (see On the other hand, despite being 

quite effective, one cannot expect that this criterion can 
identify all the relevant saddles involved in the folding 
process. Actually, it has been observed that the energy 
landscapes of sequences SI and S2 contain a relatively 
small set of first order saddles connecting pairs of min- 
ima, whose dg is definitely larger than d^'"'. In order to 
avoid missing such saddles, one can use a more refined 
strategy (see ^4|) . The identified minima are taken as 
initial conditions of the Langevin dynamics at T — Tf. 
The dynamics evolves for a time interval At = 10"'^ in 
the natural time units of the model. The overdamped 
dynamics is then applied starting from the final con- 
figuration [2l| : if it converges to the starting minimum 
one goes back to the final configuration and makes the 
Langevin dynamics further evolve for a time interval At . 
The algorithm is iterated until the overdamped dynamics 
converges to a minimum different from the starting one. 
The new identified minimum is added to the database of 
minima, if it had not been previously recorded. Accord- 
ingly, the database of saddles is also updated by adding 
a connection between the starting minimum and the new 
one. This procedure has been repeated ten times from 
each minimum. 

Finally, the first order saddles have been identified by 
applying the same procedure proposed in Here, we 
just want to recall that it is based on an iterative al- 
gorithm, which exploits the steepest-descent dynamics 
generated close to the ridge, separating the basins of at- 
traction of two connected minima. 

The number of minima N and saddles S obtained for 
each sequence are reported in the first two lines of Table 
im We have also selected the subsets of minima and sad- 
dles, whose potential energy is contained in between the 
potential energy of the native state, Vq (see Table HI)) and 
the "folding energy" Ej = Vo + l/2kBTf{2L-3) . We ex- 
pect that these subsets should contain the main elements 
associated with the folding process. Their numbers, in- 
dicated by Nf and Sf, respectively, are also reported in 
Table [ill Notice that the fraction of minima and sad- 
dles below Ef reduces much more for the heteropolymer 
sequences (SI and S2) than for the homopolymer (SO). 
This is a first indication that the energy landscape close 
to the native valley exhibits quantitative differences for 
different sequences. 
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oz 


N 


180156 


87580 


110524 


S 


349197 


213219 


304303 


Nf 


99797 


17726 


35852 


Sf 


276958 


85014 


150809 



TABLE II: Number of minima, N, and saddles, S, for 
the three analyzed sequences. The number of minima, Nf 
and saddles, Sf, below the folding energy Ef = Vo + 
l/2KBTf{2L - 3) is also reported. 



C. The directed graph 

A sufhciently rich database containing the minima and 
the first order saddles can be used for constructing the 
corresponding directed graph. Each node of the graph 
corresponds to one minimum in the database (or, equiv- 
alcntly, to its basin of attraction). In what follows we 
assume that the N nodes are ordered for increasing val- 
ues of their potential energy and assign them an index i 
that runs from 1 to N. Accordingly i — 1 corresponds to 
the so-called native state. A link is traced in the graph 
between node i and node j if the connection database 
contains a first order saddle Sij connecting them. We 
associate to this link its " strength" , determined by the 
rate of transition from i to j, Fjj . This quantity can be 
approximated, for sufficiently low temperatures, by the 
Langer estimate [l5|. It generalizes the usual Arrhenius 
formula by including the entropic factors associated with 
the curvatures of minima and saddles in a high dimen- 
sional space: 

n^::^^-^L- H" ) ■ 

(7) 

The Wj-'^'' are the L' ~ 2L ~ 3 nonzero eigen-frequencies 

(k) 

of the minimum associated with node i, the u}j_ \ j are the 
L' — 1 nonzero frequencies corresponding to the contract- 
ing directions of Sij, while wn i j is the frequency associ- 
ated with the only expanding direction of Sij . The dis- 
sipation rate 7 is the same used for defining the Langevin 
dynamics mentioned in the previous Section. The expo- 
nential factor in ([7]) depends on the height of the energy 
barrier V{si_j) — V{i), where we have used the simple 
notation V{sij) and V{i) for the values of the potential 
energy at saddle Sij and at node i, respectively. No- 
tice that the Langer estimate relies upon the assumption 
that at temperature T the rate of transition between two 
nodes (i.e. minima of the potential energy) i and j is de- 
termined by the potential energy and by the curvature of 
the highest-energy point (i.e., the saddle Sij) along the 
minimal energy path connecting i to j. 

We expect that a good estimate of the link strength 
Tij is crucial for obtaining a suitable reconstruction of 
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the folding process. For instance, it has been shown in 
[^] that, for fast-folding heteropolymers like SI, confor- 
mationally distant minima may be connected by saddles 
with relatively high transition rates. This observation 
nicely fits with the image that the wandering of a fast- 
folder in its energy landscape close or below Tf is sig- 
nificantly biased towards its native state by "shortcuts" 
connecting quite different configurations with native-like 
ones. 

We conjecture that MD simulations at temperature T 
can be effectively replaced by a Markov process on the 
directed graph, where in general Tij ^ Tj^i. 

This directed graph representation of the energy land- 
scape naturally yields a mathematical description in 
terms of a nonsymmetric N x N connectivity matrix F, 
whose nonzero elements F^.j are defined in ((T]). Notice 
that, by definition, Ti^i = Vi . 

The master equation which defines the time evolution 
of the probability Pi (t) that the polymer is in node i at 
time t reads: 



dm) 

dt 



N N 

^P,(t)F,-,-P,(O^F,,. 



(8) 



i=i 



This master equation can be cast into matrix form: 



dm 

dt 



-WP{t) 



(9) 



where P{t) is the vector of dimension at time t, whose 
elements are the Pi (t) , while the entries of the Laplacian 
matrix W are given by the expression 



N 



(10) 



k=l 



W is a nonsymmetric real matrix with positive diagonal 
elements and whose rows and columns sum up to zero: 
according to Gershgorin's theorem jT^, all its eigenval- 
ues Ti , i = I,-- - ,iV, are real and positive, apart the 
null eigenvalue, ri — (usually, the r^'s are listed in in- 
creasing order with the index i). We denote with w^*) 
the corresponding eigenvectors. 

A computational gain for the numerical diagonaliza- 
tion of W can be obtained by transforming W to the 
symmetric matrix W — T^^WT where 



T = 







V 







"at 



(11) 



Actually, it can be shown [l7| that this is a consequence 
of the validity of the detailed balance conditions, which 
hold for dynamics ([5]). In fact, the eigenvector w^^^ repre- 
sents the stationary probability measure on the directed 
graph, since it is the solution of the equation P = 0. 



In particular the components of w^^^ are given by the 
following expression: 



w. 



(1) 



n 



fc=i ' 



,(fe)' 



(12) 



where a is a suitable normalization constant, such that 

J2iLi ^i^^ — 1- Notice that is the stationary proba- 
bility for the polymer to be at node i and its expression 
corresponds to the statistical weight of thermodynamic 
equilibrium conditions in the harmonic approximation of 
minimum (node) i. More generally, by combining (jl2p 
and ([7]) it can be easily verified that for each pair of con- 
nected nodes detailed balance conditions hold: 



(i)t- (i)t- 



(13) 



so that w*-^-* contains all the information concerning the 
thermodynamic equilibrium conditions for the directed 
graph. 

The nonzero eigenvalues of VF, r^, i = 2, • ■ • , iV, rep- 
resent the relaxation rates to equilibrium of the corre- 
sponding eigenvectors w*-*^ . Given any initial probability 
distribution on the directed graph P(0) = X^feLi CfeW^*^^ 
(with Cfc real with the normalization condition ci — 1, 
see Appendix |^ , the value it will take in i at time t is 
given by the expression: 



N 



Ck w. 



(k) 



(14) 



k=l 



This expression stems from the orthogonality of the 
eigenvectors w^*^ (see Appendix . One can easily re- 
alize that in the limit t — > c» — > w'^p, i.e. any P(0) 
will eventually evolve to thermodynamic equilibrium, as 
expected. 

Another consequence of is that the stationary 
probability flux 



V ■ — 



■'I! ij 



n 



1 ,(fc) 
fc=i 



M2^ 
) 



(15) 



depends only on the energy and on the curvature of s^j-. 

Let us conclude this section by observing that inter- 
esting topological properties of the directed graph can be 
studied by introducing the "discrete" connectivity matrix 
Fd, where all the nonzero elements of matrix F are set to 
1. By replacing Fd with F in (fTO|) . one can define the "dis- 
crete" Laplacian matrix Wd- As we are going to discuss 
in the following sections, this matrix contains relevant 
information about the graph structure: the power-law 
behavior of the low-frequency component of its spectral 
density determines the spectral dimension of the graph 
[iS^ . This extends the concept of Euclidean dimension to 
graphs which are not defined on a regular lattice. 
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III. COMPARISON BETWEEN MD AND THE 
MARKOV CHAIN ON THE DIRECTED GRAPH 

In this section we want to check if the stochastic dy- 
namics defined by the Laplacian matrix W on the di- 
rected graph is consistent with MD simulations, at least 
for values of the temperature T close to Tf. Despite 
the many approximations introduced in the procedure for 
identifying minima and saddles and in the estimate of the 
transition rates T^ j, we expect that a sufficiently detailed 
reconstruction of the directed graph can yield statistical 
results in quantitative agreement with averages over MD 
trajectories. 

A first simple test can be performed on the expecta- 
tion values of equilibrium properties. These can be ana- 
lytically computed on the graph through the stationary 
probability vector w'^' defined in ea. p^ . Since equilib- 
rium properties depend only on the identified minima, 
this test can provide a quantitative verification of the re- 
liability of the algorithm used for locating them in the en- 
ergy landscape. In particular, we have compared equilib- 
rium MD estimates of the folding temperatures Tf of the 
considered sequences with the same quantities computed 
by the equilibrium probability distribution, represented 
by the eigenvector w^^\ In Fig[T]we plot the equilibrium 
probability Pf that a sequence is in the native state or in 
the set of minima directly connected with it as a function 
of temperature T. In practice this amounts to measure 
the fraction of " folded" sequences at a given T. We apply 
the same criterion used in equilibrium MD simulations 
(see Q): the value of Tf is determined by assuming that 
at this temperature 50% of the polymer configurations 
are in the "folded" state. The results obtained with MD 
and with the directed graph representation are reported 
in table mil The quantitative agreement is very good for 
SI and worsens for SO and S2. Anyway, some relevant 
features are reasonably reproduced by the graph calcula- 
tion: both SO and S2 have similar values of Tf , smaller 
than the values obtained for SI. 




0.02 0.04 0.06 0.08 0.1 



T 

FIG. 1: Folding probability Pf estimated from Ea. (|12|l as a 
function of temperature T. The full, dot-dashed and dashed 
lines correspond to SO, S2 and SI, respectively. 

These results confirm that equilibrium properties of 
the directed graph are quantitatively consistent with MD 





MD 


graph 


SO 


0.044 


0.026 


SI 


0.061 


0.066 


S2 


0.044 


0.033 



TABLE III: folding temperatures as computed by means of 
MD simulations (table |lj and of the analytical expression of 
the stationary distribution on the connectivity graph. 

predictions. 

The comparison between MD and the graph dynamics 
(GD) concerns dynamical indicators. We report hereafter 
the results of numerical simulations for the average exit 
time from a region in the landscape and for the first pas- 
sage time from the native state. The former aims at ver- 
ifying the conjecture that MD corresponds to a sequence 
of thermally activated transitions through the nodes of 
the directed graph; the latter amounts to obtain a quan- 
titative estimate of the time scale involved in the folding 
process. 

GD is performed by assigning the average time spent 
at node i as follows [22| : 

(U) = . (16) 

The index k runs over all the nodes directly connected 
with i. The probability of moving from node i to node j 
is given by the expression 

n.M-r,,,(i,) (17) 

Accordingly, a trajectory on the directed graph 
can be represented by an ordered set of symbols 
(i, j, fc, • ■ • , m, n) labeling the sequence of visited nodes: 
its time duration is given by the expression 



^ = E(^a) ■ (18) 

CK — 2 

Notice that the time scale which establishes the cor- 
respondence between MD and GD is the inverse of the 
dissipation rate 7 (see Section Hi Bl and cq.Q) . 

In Table HVl we report for two temperatures, T — Tg — 
0.4 and T = Tf — 0.6, the average exit time for sequence 
SI from the native minimum, from the first shell (i.e., the 
set of the 66 minima directly connected with the native 
one) and from the set of minima A^, whose angular dis- 
tance dg from the native state is smaller than 0.4. The 
latter set contains 2341 minima, including the first shell. 

MD averages have been performed over 10^ trajectories 
starting from each minimum in the considered set, while 
GD averaged have been performed over 10^ stochastic 
paths. The data exhibits a reasonable agreement, if one 
considers that Langer's formula is known to systemati- 
cally overestimate the actual transition rates F^^ [isj . 
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Measurements of the first passage time of SI through 
its native minimum have also been performed by aver- 
aging over 10^ MD trajectories and over 10'' GD paths. 
We have considered three different classes of initial con- 
ditions: the minima in the first shell, the set A4, and the 
larger set of minima TV, whose potential energy is smaller 
than ViO) + l/2KBTfL' = -3.45 ( 17726 minima). For 
T = 0.04 we find a good quantitative agreement between 
MD and GD (see Table |V|, while for T = 0.06 the first 
passage time on the the graph is much smaller then the 
MD predictions. Apart the approximation due to the 
Langer's estimate of the transition rate, the main rea- 
son for this discrepancy has to be attributed to the poor 
reconstruction of the energy landscape for high values 
of the potential energy. In fact, we have checked that 
by considering the subset of trajectories which go to the 
native minimum without escaping from the set of initial 
minima, average first passage times converge to much 
closer values. This implies that MD visits regions of the 
energy landscape that are scarcely accessible to the graph 
dynamics. Only for temperatures smaller than Tf this ef- 
fect is highly reduced and MD trajectories tend to remain 
confined in a smaller portion of the energy landscape, 
which is more accurately represented in the graph. We 
want to point out that a finer sampling of the energy 
landscape would demand a sensible computational effort 
even for producing a small improvement. 

Performing similar kinds of measurements for SO and 
S2 at T = 0.04 is practically unfeasible for MD simula- 
tions, since the average exit and folding times increase 
at least of two orders of magnitude with respect to SI. 
Measurements of the average folding time at T = 0.06 
starting from the set of minima with potential energy 
below l/2KBTfL' give 1.3 x 10^ for MD and 9.8 x 10^ 
for GD. Since the quantitative agreement is maintained 
we can reasonably conjecture that it should hold also 
at lower temperatures. Accordingly, we have estimated 
by GD the average first passage time of SO and S2 at 
T = 0.04. In both cases we have found values three or- 
ders of magnitude larger than those obtained for SI. We 
can conclude that, at variance with MD, the graph dy- 
namics allows to estimate the average folding time even 
for T < Tf, thus providing a clear characterization of a 
fast folder with respect to generic polymers. 



IV. RENORMALIZATION OF THE DIRECTED 
GRAPH 

As we have shown in the previous section MD and GD 
exhibit a reasonable quantitative agreement, while nu- 
merical simulations of the latter are faster than the for- 
mer. On the other hand, a considerable computational 
price has been payed for obtaining a suitable reconstruc- 
tion of the energy landscape. The very advantage of the 
directed graph representation stems from the possibility 
of applying an effective renormalization procedure. In 
fact, at a given temperature T the directed graph can 



From the native state 




T=0.04 


T=0.06 


MD 


4193 


164 


GD 


3509 


85 


From the 1st shell 




T=0.04 


T=0.06 


MD 


23459 


446 


GD 


13077 


243 


From the set M 




T=0.04 


T=0.06 


MD 


326855 


2052 


GD 


107775 


704 



TABLE IV: Comparison of average escape times of SI from 
three different sets of initial conditions computed by MD and 
GD simulations at temperatures T = 0.04 and T — 0.06. The 
first shell contains 66 minima, while the set A4 contains 2341 
minima. 



From the 1st shell 




T=0.04 


T=0.06 


MD 


5746 


1153 


GD 


4307 


461 


From the set M 




T=0.04 


T=0.06 


MD 


8535 


4120 


GD 


4434 


512 


From the set f\f 




T=0.04 


T=0.06 


MD 


6947 


5759 


GD 


4566 


583 



TABLE V: Comparison of average first passage times at the 
native state from three difi'erent sets of initial conditions com- 
puted by MD and GD simulations at temperatures T — 0.04 
and T = 0.06. The first shell contains 66 minima, the set M 
contains 2341 minima and the set A/" contains 17726 minima 



be renormalized by eliminating all those links (saddles) 
corresponding to energy barriers smaller than KbT. In 
practice, the procedure can be performed by reducing the 
pair of nodes connected by such a barrier to a single node, 
whose connectivity is renormalized accordingly. More 
precisely, the renormalization algorithm goes through the 
following steps. 

• For any pair of connected nodes i and j we compute 

AV,,,^V{s,,,)~VU) (19) 
ifV{j)>V{z); 

• the AVij- are listed in increasing order and we 
choose the subset 
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S = {AV^,j : AV^^J < KbT} 



(20) 



• starting from the smallest element of S the graph 
is renormalized by first assimilating node j with 
node i: the renormalized equilibrium eigenvector 
w^^^ and the renormalized rates Tk,j are defined as 
follows: 



- w 



(1) 



Tfej if Tfcj = 
Tfe.i + T/cj otherwise 



(21) 



(22) 



and 



if r 



k,i 



Tifc — 



otherwise 



(23) 



• then we pass to the second lowest value in the set 
E and proceed sequentially until all the elements of 
this set are renormalized. 

The renormalized rates Tij define a renormalized evo- 
lution matrix W, whose equilibrium eigenvector is tJ)'-^-' . 
It can be easily argued that detailed balance condi- 
tions (|13p are maintained for the renormalized quantities. 
Moreover, since it only involves node removal, the proce- 
dure just outlined maintains the original ordering of the 
graph nodes. 

The renormalization procedure transforms also the dis- 
crete connectivity matrix and the discrete Laplacian 
matrix Wd to Fd and Wd, respectively. In the following 
section we shall first discuss how the topological proper- 
ties of the renormalized directed graph depend on tem- 
perature. Afterwards, the corresponding dynamical fea- 
tures will be analyzed. 



A. Topological properties of the renormalized 
connectivity graph 

The renormalization procedure presented in Sect ion ITVl 
depends on the temperature and here we want to ana- 
lyze how it can change the topological properties of the 
directed graph. 

In Fig. ^ we show how the number of nodes N and 
the number of connections per node a = S/N change 
with the temperature, for the three sequences defined 
in Section |TT1 Temperatures are varied in the range 
< T < 0.08, thus encompassing both the glassy and 



the folding temperatures of all sequences (see Table I). 
The homopolymer SO exhibits a very slow exponential 
decay of N, while a slightly increases. Conversely, for 
SI and S2, N reduces by more than one order of mag- 
nitude, while a uniformly decreases beyond T — 0.02 . 
These results stress the topological differences between 
the graph representations of the homopolymer SO and of 
the heteropolymers SI and S2. In the explored range of 
temperatures the graph of SO is poorly affected by renor- 
malization. This is due to the peculiar structure of its 
energy landscape: most barrier heights separating local 
minima from their neighbors are deep enough to keep 
their role of "hubs" in the renormalized graph, while a 
relatively small fraction of minima are absorbed by the 
hubs, thus slightly increasing the average connectivity. 

In the graph representation of SI and S2 the number 
of hubs is significantly reduced, because an increasing 
number of barrier heights is renormalized as tempera- 
ture increases. Moreover, apart an initial increase up 
to 7, cr drops to much smaller values. The main reason 
for this sensible reduction of the average connectivity is 
essentially due to the fact that most of the hubs share 
with the renormalized nodes connections with the same 
set of neighboring nodes, at variance with SO, where the 
connectivity is much more sparse. A simple argument 
accounts for these opposite mechanisms. Let us suppose 
that node j, with connectivity Sj, is renormalized to node 
i, with connectivity s;. The connectivity of the hub i 
becomes equal to Si = st + Sj — c ^ 2 if c is the num- 
ber of connections with neighboring nodes common to 
i and j. There are two extreme situations: if c = 
(no common connections) then Si = Si + Sj — 2, while if 
c = Si — 1 = Sj — 1 {i and j have the same neighbors) 
then Si = Si — 1. This argument indicates that above 
T = 0.02 many nodes in the graphs of SI and S2 are 
highly connected among themselves so that renormaliza- 
tion acts as an overall reduction of the average connectiv- 
ity. More precisely, only fewer and fewer hubs maintain a 
high connectivity, while the great majority assumes the 
role of peripheral nodes. This picture is also confirmed 
by looking at the distribution of the renormalized average 
connectivity a at different temperatures: it is practically 
unmodified for SO, while for SI and S2 it tends to be 
sharply peaked at 1 as T increases. In Fig.s ^ , ^ and 
dS]) we report the fraction of nodes with connectivity a, 
v(cr) = N{a)/N, versus a for different values of T. No- 
tice that both N{a) and N vary with the renormalization 
temperature . 

A special role in the renormalization procedure is 
played by the "native" node, which becomes the main 
hub of the network, since it collects an increasing num- 
ber of renormalized nodes as T increases. This effect is 
significantly more pronounced for SI and S2 with respect 
to SO: for instance, at T = 0.08 ctq - 10^ for SI and S2, 
while do 10^ for SO . 

The topological properties of the directed graphs con- 
sidered in this section can be also described by determin- 
ing their spectral dimension d [isj . In an infinite graph 



FIG. 2: Number of nodes A'^ (a) and average connectivity a 
(b) of the renormalized graph versus temperature T for SO 
(empty circles), SI (crosses) and S2 (filled squares). In (a) 
the dashed lines are exponential fits of the data, due to the 
adopted log-lin representation. The full lines in (b) are drawn 
to guide the eyes. 
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FIG. 4: Sequence SI: the fraction of nodes with connectivity 
(7, i^(o-), in log-log scale at T = (full line), T = 0.04 (dashed 
line) and T = 0.08 (dot-dashed hne). 
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FIG. 3: Sequence SO: the fraction of nodes with connectivity 
a, u{a), in log-log scale at T = (full line), T = 0.04 (dashed 
line) and T = 0.08 (dot-dashed line). 



FIG. 5: Sequence S2: the fraction of nodes with connectivity 
a, i^(cr), in log-log scale at T = (full line), T = 0.04 (dashed 
line) and T = 0.08 (dot-dashed hne). 



this quantity is defined by the formula 

R{uj) - oj^ (24) 

where R{uj) is the integrated density of harmonic vibra- 
tional modes with frequency uj. It is worth recaUing 
that in our model we are dealing with the spectrum of 
the discrete Laplacian matrix Wd, with finite rank N 
(see Section IIIC|) . If we denote its eigenvalues with 
Ai < A2 < • ■ • < Ak < ■ • • < An we can define the in- 
tegrated density of eigenvalues i?(Ak). In this case, we 

can reasonably assume that, if the rank N of Wd is suffi- 

1/2 

ciently large, by identifying lu with uj^ — Aj, the spectral 
dimension can be estimated through the approximate re- 
lation 

^(A.) - \T (25) 

In Fig. [HI we show that one obtains close estimates 
of the spectral dimension (d « 6.5) for the connectivity 
graphs of all the three sequences considered in this paper. 
In an infinite graph the application of the renormaliza- 
tion procedure described in Section lTVl cannot change the 



value of d 18]. This is what happens also for the finite 
graph representing the homopolymer SO: Fig. [7] shows 
that, independently of the temperature T, the value of d 
in the renormalized graph is constant. Conversely, for the 
heteropolymers SI and S2 the value of d decreases to a 
value close to 5 for T > 0.02 (see Fig. |S1 where we report 
the data for sequence SI, which exhibits the same behav- 
ior of sequence for S2). This indicates that, beyond this 
temperature, the renormalization procedure modifies the 
topological properties of the finite directed graphs: re- 
gions of high connectivity collapse into big hubs, while in 
the renormalized graph the average connectivity is sig- 
nificantly reduced. This is consistent with what is shown 
in Fig. m . 

We can conclude that all topological indicators ana- 
lyzed in this section allow to distinguish between the ho- 
mopolymer SO and the other heteropolymers. On the 
other hand, no significant difference between SI and S2 
can be identified. As we have discussed in Section Hill this 
seems to emerge only by considering dynamical proper- 
ties of the directed graphs. In particular, in what follows 
we are going to show that this different dynamical fea- 
tures are maintained also in the renormalized graphs. 
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0.01 0.1 

A, 

FIG. 6: Log-log plot of the spectrum of eigenvalues of the dis- 
crete Laplacian matrix Wd of the zero-temperature connec- 
tivity graph of SO (empty circles), SI (crosses) and S2 (filled 
squares). The dashed line refer to power-law with exponents 
3.3 . 

100 r \ \ \ \ n 




0.01 

X 

FIG. 7: Log-log plot of the spectrum of eigenvalues of the 
discrete Laplacian matrix Wa of SO for five different tempera- 
tures: T = 0.00, 0.02, 0.04, 0.06, 0.08. We have used the same 
symbols independently of temperature to point out that data 
collapse on the same curve: the full line corresponds to the 
power-law fit with exponent 3.1. . 



B. Relaxation times and the low frequency 
spectrum of the Laplacian matrix 

One can easily realize that obtaining a complete char- 
acterization of the spectrum of the Laplacian matrix W 
defined in (jlOp demands the diagonalization of a matrix 
whose rank is 0(10^) (see Table lll| . This is practically 
unfeasible. On the other hand, we are at least interested 
in reconstructing the part of the spectrum of W contain- 
ing the smallest eigenvalues rfc, k = 2, 3, • • • , which are 
related to the largest equilibration time scales. This task 
can be accomplished by using Lanczos-like algorithms of 
the ARPACK Library [19,]. 

We have first checked that the lowest part of the spec- 
trum of W and of its renormalized version W are con- 
sistent. In fact, since the renormalization procedure de- 
scribed in the previous section amounts to a series of 
local transformations on the directed graph, we expect 
that the first nonzero eigenvalues should not vary. The 
corresponding eigenvectors, w^^^ ^ fc = 2,3, • • ■ should as 
well keep their " structure" , modulo the renormalization 
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0.01 0.1 

FIG. 8: Log-log plot of the spectrum of eigenvalues of the 
discrete Laplacian matrix of SI for five different tem- 
peratures: T=0.00 (O), 0.02 (-h), 0.04 (□), 0.06 (A), 0.08 
(x). The data obtained for different temperatures are shifted 
horizontally by an arbitrary constant factor in order to ob- 
tain a better view. Notice that the power-law fit passes from 
its maximum value, 3.4 at T = to a minimum value of 
2.5 above T = 0.02. 

procedure, which changes their dimension (see Section 
IIV[) . Such a comparison has been performed for the di- 
rected graph of SI at T = 0.08 [H. Table IVll reports 
the rank of W at various temperatures for the three se- 
quences investigated. 



T 


SO 


SI 


S2 


0.02 


138791 


41139 


38997 


0.04 


130326 


20537 


26292 


0.06 


117328 


9430 


15943 


0.08 


102933 


4576 


9736 



TABLE VI: Rank of the renormalized Laplacian matrix W 
the three investigated sequences at various temperatures be- 
low To. 

Relying upon this analysis, we can assume that the in- 
teresting spectral properties of W can be extracted from 
W (by exploiting also the symmetrization procedure de- 
scribed in Section [II C|l . In fact, if we consider lower val- 
ues of the temperature, i.e. T = 0.06 and T = 0.04, the 
rank of W reduces for SI to 9430 and 4576, respectively 
(see Table IVI)) . On the other hand, a direct compari- 
son with the spectrum of W would demand a too high 
computational cost: at lower temperatures the relative 
separation between the eigenvalues at the lower end of 
the spectrum quickly approaches machine precision thus 
dramatically slowing down the convergence rate of the 
diagonalization algorithm [l9j . 

In Fig.® we show the dependence of the smallest non- 
zero eigenvalues of W on the temperature T for sequences 
SI and S2 . In both cases we obtain evidence of an 
Arrhenius-like behavior 

rfc ~ Aexp(-B/ifBT). (26) 

where A and B are suitable constants, which depend on 
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FIG. 9: The eigenvalues r^, for k — 2,3,4 SI versus temper- 
ature T in log-reciprocal scale for SI (a) and S2 (b) . The 
dashed lines are exponential fits to the data. 
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FIG. 11: Components of to'^' and absolute value of the 
components of w'^', lo^^-* and w''*' for the fast-folder SI at 
T = 0.04. 



the sequence and on k. 

The corresponding eigenvectors are shown in Fig.s (fTO|) 
and PT|) . To ease reading we report the absolute value 
of the components of eigenvectors w*-^-*, w^'^^ and w^^\ 
since, as shown in Appendix \^ their components sum 
to and therefore fluctuate wildly between positive and 
negative values. The eigenvectors w^^-* is also reported for 
comparison. We recall that the eigenvector components 
are ordered according to the energy of the corresponding 
graph node. All the eigenvectors of S2 are localized on 
some node of the graph, as well as the w'-^-' eigenvector of 
SI. The other eigenvectors of SI are instead delocalized. 




FIG. 10: Components of ui'^^ and absolute value of the com- 
ponents of w^^\ ui'^' and ui''*^ for the slow-folder S2 at 
r = 0.06. 

We have verified that, for the localized eigenvectors, 
the energy B amounts to the height of the lowest en- 
ergy barrier separating the localized node from the graph. 
Nonetheless this energy barrier is quite high and, in this 
sense, the node can be viewed as a kinetic trap of GD. 
We have also found that most of the first 50 non-zero 
eigenvectors of S2 are localized as those shown in Fig. 
(fTO| . Accordingly, this suggest that a great deal of nodes 
play the role of a kinetic trap, thus slowing-down any 
dynamical process on the graph. 

For what concerns the delocalized eigenvectors of SI, 
B can be interpreted as an effective energy barrier sep- 



arating different subsets of nodes in the graph. This 
interpretation can be supported by the following argu- 
ment. Let us suppose that a graph is subdivided into two 
subgraphs A and B, weakly connected between them- 
selves. By dividing each component by w^'' one 
obtains a "normalized" eigenvector w^^\ whose compo- 
nents are split into two sets of values (see AppendixiBjfor 
details) , which identify the nodes contained into the two 
subgraphs. The normalized eigenvectors w^^') and w(^) , 
shown in the upper panels of Fig. (|12p . exhibit a simi- 
lar scenario, although they are split into more than two 
sets of values. This indicates that the graph structure 
of SI is more intricated than in the example discussed 
in Appendix iBl Nonetheless, the interpretation of B as 
an effective barrier height separating different regions of 
the graph remains valid. Notice that the correspond- 
ing eigenvalues and are more than four orders of 
magnitude larger than . This shows that the perturba- 
tive argument presented in the Appendix provides only 
a qualitative approximation of what seen in Fig. (|12p . 

For what concerns the folding process, we have already 
observed in Section IIIII that the time scales of equilibra- 
tion are orders of magnitude larger than those character- 
izing the first passage time through the native valley (i.e. 
the native minimum and the connected minima). In fact, 
according to our definition of the folding temperature, we 
can guess that over the equilibration time scale approx- 
imately 50% of the time is spent in the native valley, 
despite it contains a very small fraction of the minima 
(nodes) in the landscape (directed graph). In the renor- 
malized representation of the directed graph we expect 
that the quantitative determination of the average first 
passage time through the native valley (that is reduced 
to a single node, as a result of the renormalization of the 
minima connected to the native one) is preserved, pro- 
vided the graph is renormalized for temperatures T < Tf. 
For instance, we have verified that this is the case for SI 
at r = 0.04 where the folding time on the renormalized 
graph amounts to 1892 when averaged over 10^ paths. 
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FIG. 12: Absolute value of the components of the normalized 
eigenvectors w^^\ w^^'i and for the fast-folder SI at T = 
0.04. 



V. CONCLUSIONS AND PERSPECTIVES 

Many phenomena of biological interest are associ- 
ated with equilibrium and non-equilibrium processes in 
polypeptidic chains. A suitable description and under- 
standing of such processes is far from trivial in these 
complex structures. This is the main reason why in this 
manuscript we have decided to consider a sufhciently sim- 
ple and widely investigated model 8] for testing an effec- 
tive approach to these problems. In particular, we have 
analyzed two heteropolymers and one homopolymer in 
order to point out differences and analogies among var- 
ious typical polypeptidic sequences. In fact, one of the 
heteropolymer is known to behave as a " fast folder" , at 
variance with the other ones, which exhibit a much slower 
relaxation dynamics to their native states. 

As a first step we have described a strategy for recon- 
structing the energy landscape of these simple chains. 
The search for minima and first order saddles is per- 
formed by combining different algorithms aiming at a 
sufficiently careful reconstruction of the landscape close 
to the native minimum, up to values of the energy of the 
order of KsTf, where Tf denotes the folding tempera- 
ture. Since the number of minima and saddles increases 
with the energy, the computational cost is already quite 
high up to KsTf: going beyond this value is practically 
unfeasible. On the other hand, performing a more ex- 
tended search is not expected to add relevant informa- 
tion. In fact, we have checked that the main dynamical 
mechanisms associated with the folding process and with 
the relaxation to equilibrium are very well reproduced. 



even if most of the stationary points in the energy land- 
scape above KbTj are discarded. 

The following step amounts to exploit our knowledge of 
the relevant part of the energy landscape for reconstruct- 
ing a directed graph representation of the dynamics. Ac- 
tually, the temperature dependent molecular dynamics 
performed in the energy landscape can be replaced by a 
Markov-chain dynamics on a directed graph. The nodes 
of the graph correspond to the local minima of the energy 
landscape, while the first order saddles connecting such 
minima are represented by the links of the graph. The 
strength of the links is measured by the Langer's estimate 
[m of the hopping rates between connected minima. The 
directed nature of the graph is associated with the asym- 
metry due to the different energies of the two connected 
minima. We have shown that, for temperatures close or 
below Tf , MD simulations are essentially in good quanti- 
tative agreement with GD. Anyway, in general the latter 
systematically underestimate the former, mainly because 
it has not access to the poorly reconstructed portion of 
the energy landscape above KsTf. 

The main advantage of using a graph representation is 
that one can apply a renormalization procedure, which 
keeps the topological as well as the dynamical features 
of the model, while significantly reducing the dimension- 
ality of the graph as temperature is increased. Actually, 
the procedure described in Section IIVI allows to aggre- 
gate many of the original nodes into renormalized " hubs" , 
characterized by a high connectivity, while a great deal 
of nodes exhibits a weak connectivity. This effect is much 
more relevant in the heteropolymers, than in the ho- 
mopolymer, thus indicating that the topological proper- 
ties are quite different in the two cases. Anyway, this al- 
lows us to conclude that the homopolymer, thanks to its 
spatially homogeneous structure, exhibits peculiar topo- 
logical properties with respect to any disordered sequence 
of peptides. This notwithstanding, in the simple model 
considered in this paper we have evidence that topolog- 
ical properties are not sufficient for discriminating be- 
tween a "fast" and a "slow" folder, which are known to 
correspond to sequences SI and S2, respectively. This 
does not exclude that topological indicators should not 
be effective in more realistic models of proteins. Nonethe- 
less, we can affirm that dynamical properties analyzed 
in the MD, GD and renormalized GD provide a clear 
identification of the " fast folder" and are expected to be 
effective for the wide class of models of polypeptides con- 
sidered in the literature. In particular, we find that the 
average first passage time from the native configuration, 
which provides an estimate of the folding time, is at least 
two orders of magnitude smaller for the "fast folder" SI 
than for the other sequences. By comparing this time 
scale with the typical equilibration time scale associated 
with the inverse of the smallest eigenvalue of the Lapla- 
cian matrix of the renormalized graph we find that they 
are comparable for SO and S2, while for SI it is three 
orders of magnitude smaller. Considering that for SI at 
Tf more than 50% of the equilibrium measure is con- 
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centrated in the native valley, we can conclude that the 
dynamics of the fast folder spends a great deal of time ex- 
ploring configurations close to its native state, even dur- 
ing the transient to equilibration. This points out that 
the folding process can be viewed as a genuine noncqui- 
librium process, since the equilibration time scales are 
too large to be compared with experimental estimates of 
the typical folding time of a polypeptidic chain. 

We want to conclude by pointing out that the meth- 
ods described in this paper can be applied also to more 
realistic models of polypeptides and single-domain pro- 
teins. It is evident that the reconstruction of the relevant 
portion of the energy landscape associated with the fold- 
ing and equilibration processes may yield high computa- 
tional costs. Nonetheless, the renormalization procedure 
can provide an effective representation of the kinetics of 
these models, up to the time scales typical of the equili- 
bration process, which usually cannot be explored by the 
traditional MD approaches. 
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APPENDIX A: PROPERTIES OF THE MASTER 
EQUATION 

We now show some useful properties of the Master 
equation and of the eigenvectors of the Laplacian ma- 
trix. 

As a preliminary observation we note that, by sum- 
ming both sides of the Master equation over all the nodes 
of the graph and invoking the detailed balance condi- 
tion, one can easily verify that the total probability is 
conserved: 



gets 



dt 



N 



The normalization condition Pi{t) = 1 therefore holds 
at every t, which, as we will see later, induces some con- 
straints on the projections of realistic probability vectors 
on the eigenvectors of the Laplacian Matrix W. 

As already mentioned, W can be cast into a symmet- 
ric form trough a similarity transformation. It therefore 
admits a complete basis of orthogonal eigenvectors, each 
describing a different mode of decay to equilibrium. Be- 
sides orthogonality, the eigenvectors of W share the ad- 
ditional property that their components are zero-sum. In 



Summing over i each members and once again invoking 
detailed balance, leads to 



N 



(A3) 



The only eigenvector that defies this demonstration is 
ui*-^-* the null eigenvector defined in Equation (H^), which 
actually has positive components and can be normalized 
to unity. Using this normalization one can write: 



N 



(A4) 



i=l 



Actually the fact that eigenvector have zero sum is just 
a consequence of the fact that the master equation con- 
serves probability. Indeed, since eigenvectors form a com- 
plete basis, each probability distribution of initial con- 
ditions on the graph P(0) can be expressed as P(0) = 
X^j^i ajW^^'. It will then evolve in time according to 



N 

P{t) = E ajw'^^^ exp-''^* . 
Summing over the components of P{t) 

N N 

5^P.(0=Ea,<5i,,exp-'^^* = 

i=i j=i 



(A5) 



"0- 



(A6) 



fact, from the eigenvalue equation Ww, 



w. 



Hence, in order to have Pii^) = li Q!i must neces- 

sarily be 1. 



APPENDIX B: SPECTRAL CLUSTERING 

We now justify the use of reweighted components as an 
effective tool to uncover the inherent structure of eigen- 
vectors. We define reweighted components of a vector v 
on a graph as the ratio site-to-site of the vector com- 
ponent to the local value of the stationary probability: 
Vi — Vi /wf^ . We will here extend an argument origi- 
nally proposed for discrete graphs ^2d\ to weighted ones 
and show that, when a graph is divided into two weakly 
connected subgraphs A and B, the reweighted compo- 
nents of the first nonzero eigenvector assume only two 
possible values one for A and one for B. 

The Laplacian Matrix W can be written as the sum of 
two matrices: W = D — V^, where F is the transition rate 
matrix and D a diagonal matrix Dij — 6ij X^fcLi ^i,k- 

First of all, we consider the case in which the graph is 
composed of two disconnected subsets of nodes A and B. 
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In this case the Laplacian matrix will be referred to as 
W°. Since T^j = for each i G A and j e B, W° can be 



written as 



Daa 



^ AA 







D 



BB 



^ BB 



(Bl) 



Let now use the null eigenvector w^^^ of to construct 
two vectors e as follows: 








w. 



(1) 



i e 6 



(B2) 



of W'^ with a null eigenvector, and the same holds for 
any linear combination v = aw^ + bw^ . More generally 
when a graph is composed of n disconnected subgraphs 
the kernel of its Laplacian matrix is has dimension n. 

Let's now suppose that the two subgraphs A and B are 
not properly disconnected but do share a small number 
of connections. In this case the Laplacian matrix has the 
form W = W° + W^, with 



of the linear combinations of and w'^ . After some 
algebra reads 



1 (1) {i)r 



(B7) 

Since w'-^^ satisfies the detailed balance, and existing a 
connection from S to ^ for each connection from A to 
B, one has: 

E^^*'^r,,,=^«;«r,, VzeA (B8) 

jeB jeB 



It is easy to show that both and are eigenvectors Analogously: 



E r.,, = E r,- V* e B. (B9) 

j&A jeA 

We can therefore define two quantities a — 
EiG^tjGiS^i'f^w'f^rj-, and /3 = T,^eB,jeAW^^^'^f^^3.i 



which cast VF^ in a particularly simple form 



Dab 



^ AB 

-^BA DbA 



(B3) 



where T ab and Tjsa carry the information about connec- 
tions between A and i3, while Dab and Dqa carry the 
information on the effect these connections have on the 
diagonal of the Laplacian matrix. 

We now look for the eigenvectors of W among vectors 



of the form v 



bw 



Wv = (W" + W'^){aw^ + bw'^) 
= W\aw^ + bw'^). 



(B4) 



In other words if any eigenvector of W exists that is a 
linear combination of and it is also an eigenvector 
of W^. We will therefore look for the eigenvectors of this 
last matrix having the desired form aw^ + bw^ . 



For sufficiently small the vector W^{c 



bw'^) 



can be approximately written as a linear combination of 
and w'^ . To this purpose we introduce the projector 
on the space of the linear combinations of and w'^ 



^AB 



(B5) 



where (w'^, w'^) is a x 2 matrix whose columns are the 
two column vectors and . The vector W^{aw^ + 
bw'^) can now be written as: 

W^{aw^ + bw^) ~ liABW^iaw^ + bw^) = 



(B6) 

where we have introduced = n^gVF^II^B, a 2 x 2 
matrix that reproduces the effect of W"^ in the subspace 




W 



1 _ 
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(BIO) 



It is important to notice that, if the two subgraphs A are 
weekly connected a and (3 will be relatively small, since 
there are few connections such that Tj^i > for i ^ A 
and j e B. 

The two eigenvalues of VF^ are and a + (3, referring 
respectively to the eigenvectors 



and 



(Bll) 



It can finally be shown that backprojecting with Il^g this 
two eigenvectors of to the entire A^-dimensional space 
one obtains two eigenvectors of characterized by the 
same eigenvalues. According to Equation IB4I these are 
also eigenvectors of W. More precisely by this procedure 
one obtains 

• + w'^ which obviously coincides with w^^^ and 
is associated to the null eigenvalue also according 
to the perturbative calculation 

• u — aw^ — l3w^ , associated to the eigenvalue a + (3 
which is a small number and will therefore lay in 
the low end of the spectrum of W. 

It is now straightforward to verify that the reweighted 
coordinates of u get the values a for nodes belonging to 
A and — /3 for nodes belonging to B. In this sense the 
analysis of the reweighted coordinates of the eigenvec- 
tors of W can be employed as a spectral method for the 
identification of clusters, portion of the graph character- 
ized by a high degree of internal connectivity and a small 
number of connections with the rest of the graph. 
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